Assessing standardized contrast effects in ANCOVA: Confidence intervals, precision evaluations, and sample size requirements

Standardized effect sizes and confidence intervals are useful statistical assessments for comparing results across different studies when measurement units are not directly comparable. This paper describes and compares confidence interval estimation methods for standardized contrasts of treatment effects in ANCOVA designs. Sample size procedures are also presented to ensure that the resulting confidence intervals yield informative estimation with adequate precision. The exact interval estimation approach has theoretical and empirical advantages in coverage probability and interval width over the approximate interval procedures. Numerical investigations of the existing method reveal that the omission of covariate variables has a negative impact on sample size calculations for precise interval estimation, especially when there is disparity in influential covariate variables. The proposed approaches and accompanying computer programs fully utilize covariate properties in interval estimation and provide accurate sample size determinations under the precision considerations of the expected interval width and the assurance probability of interval width.


Introduction
The utility of effect sizes and confidence intervals has been strongly emphasized in several editorial guidelines and methodological recommendations. The standardized mean difference between two independent populations is the most frequently used effect size measure across virtually all disciplines of scientific research. Accordingly, the intuitive formula of Hedges's [1] g, commonly known as Cohen's [2] d, is an estimate of the population standardized mean difference and is defined as the difference between two sample means divided by their pooled sample standard deviation under homoscedasticity. In contrast to unstandardized contrasts, effect size reporting and interpretation practices suggest that standardized effect sizes are useful when comparing results from multiple studies using measurement instruments whose raw units are not directly comparable. Comprehensive expositions and practical uses of the standardized mean difference and related effect size measures are available in Grissom and Kim [3], Kline [4], Lin and Aloe [5], Takeshima et al. [6], Zhang [7], Zhang and Heyse [8], and the references therein. Considerable attention has also been devoted to the development of interval estimation methods for the standardized mean difference; see Odgaard and Fowler [9], Smithson [10], Steiger and Fouladi [11], Tian [12], Wu, Jiang and Wei [13], and Zou [14], among others. Standardized contrasts of treatment effects and interval procedures are presented in Kline [4], Steiger and Fouladi [11], and Steiger [15] in the context of ANOVA. Moreover, Grissom and Kim [3], Kline [4], Levin [16], and Olejnik and Algina [17] addressed the calculation and interpretation of standardized contrasts of treatment effects in ANCOVA. However, these studies did not discuss interval estimation of standardized treatment contrasts.
As an exception, Lai and Kelley [18] presented a confidence interval formula and sample size determination for standardized contrast effects in ANCOVA designs with only one covariate. In particular, Lai and Kelley [18] focused on the special case of ANCOVA with a single covariate and suggested that, under randomized designs, the linear contrast of covariate means is usually close to zero. Hence, the corresponding covariate quantity can be omitted from the sample variance of a linear contrast in confidence interval and sample size calculations for standardized treatment contrasts. Despite the conceptual argument and technical simplification, they did not conduct an empirical study to justify the suggested procedures with respect to the influence of the omitted covariate variable. Hence, it is of methodological importance to perform a detailed numerical examination to clarify the adequacy of the approximate confidence interval and sample size methods. Moreover, their model setting did not cover the more involved situations with more than one covariate. Thus, it is of practical importance to present more accurate interval estimation and sample size procedures for ANCOVA studies with possibly diverse covariate configurations.
Although randomized experiments are regarded as the gold standard of research designs, randomization may still generate groups that are imbalanced in sample size and baseline characteristics, even in large studies. Related results on the impact of covariate imbalance on the power of ANCOVA studies and randomized clinical trials can be found in Ciolino et al. [19,20], Egbewale, Lewis and Sim [21], Kahan and Morris [22], and the references therein. More importantly, feasible power and sample size procedures were developed in Shieh [23-25] and Tang [26-28] to accommodate covariate randomness and imbalance for randomized and nonrandomized designs. It should be emphasized that a thorough and rigorous illustration of interval estimation, precision assessment, and sample size techniques for standardized contrast effects in ANCOVA has not been given in the literature.
In view of the limited findings in the literature, the goal of the current article is twofold. First, this research examines and compares exact and approximate confidence interval procedures for standardized contrast effects in ANCOVA designs with one or more covariates. Second, the present study extends and complements the demonstration of unstandardized contrast effects in Shieh [29] by developing exact precision and sample size procedures for standardized contrast effects under the assumption of multinormal covariates. Extensive numerical studies of interval estimation, precision assessment, and sample size determination were conducted to illustrate the advantages of the proposed techniques over the approximate methods. To facilitate data analysis and design planning, computer algorithms are presented to resolve the computational issues of the suggested procedures.

Interval estimation procedures
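Consider the fixed-effects ANCOVA model with G treatment groups and P covariates:

$$Y_{ij} = \mu_i + \sum_{k=1}^{P} X_{kij}\beta_k + \varepsilon_{ij},$$

where $Y_{ij}$ is the score of the jth subject in the ith treatment group on the response variable, $\mu_i$ is the ith intercept, $X_{kij}$ is the score of the jth subject in the ith treatment group on the kth covariate, $\beta_k$ is the slope coefficient of the kth covariate, and the $\varepsilon_{ij}$ are independent $N(0, \sigma^2)$ errors with i = 1, ..., G (≥ 2), j = 1, ..., $N_i$, and k = 1, ..., P (≥ 1). In view of the importance and utility advocated in Grissom and Kim [3], Kline [4], and Levin [16], the current study focuses on the appraisal of standardized contrasts rather than the conventional unstandardized contrasts. A standardized linear contrast of the adjusted group means $\{\mu_1^*, \ldots, \mu_G^*\}$ is defined as

$$\psi^* = \frac{\psi}{\sigma}, \qquad \psi = \sum_{i=1}^{G} c_i\mu_i^*,$$

where $\mu_i^* = \mu_i + \sum_{k=1}^{P}\bar{X}_k\beta_k$, $\bar{X}_k$ is the grand mean of the kth covariate, and the contrast coefficients satisfy $\sum_{i=1}^{G} c_i = 0$. Note that a linear contrast of adjusted group means is equivalent to a linear contrast of intercept parameters, $\psi = \sum_{i=1}^{G} c_i\mu_i$. The standard results in Rencher and Schaalje [30] show that the least squares estimator of the ith intercept $\mu_i$ is given by

$$\hat{\mu}_i = \bar{Y}_i - \sum_{k=1}^{P}\bar{X}_{ki}\hat{\beta}_k,$$

where $\bar{Y}_i$ and $\bar{X}_{ki}$ are the sample means of the response and the kth covariate in the ith group, and $(\hat{\beta}_1, \ldots, \hat{\beta}_P)^T = S_{XX}^{-1}S_{XY}$ is the pooled within-group slope estimator. Moreover, an unbiased estimator of the adjusted group mean $\mu_i^*$ is of the form

$$\hat{\mu}_i^* = \bar{Y}_i - \sum_{k=1}^{P}(\bar{X}_{ki} - \bar{X}_k)\hat{\beta}_k.$$

Thus, a convenient estimator $\hat{\psi}$ for the contrast effect ψ is

$$\hat{\psi} = \sum_{i=1}^{G} c_i\hat{\mu}_i^* = \sum_{i=1}^{G} c_i\hat{\mu}_i.$$

It can be shown that the linear contrast estimator has the distribution

$$\hat{\psi} \sim N(\psi, \tau^2),$$

where $\tau^2 = \mathrm{Var}(\hat{\psi}) = \sigma^2 V$, $V = a + Q/(N_T - G)$, $a = \sum_{i=1}^{G} c_i^2/N_i$, $Q = \bar{X}_C^T\hat{\Sigma}_X^{-1}\bar{X}_C$, $\bar{X}_C = \sum_{i=1}^{G} c_i\bar{X}_i$ is the linear contrast of the P × 1 group covariate mean vectors, $\hat{\Sigma}_X = S_{XX}/(N_T - G)$ is the pooled within-group sample covariance matrix of the covariates, and $N_T = \sum_{i=1}^{G} N_i$. Note that the magnitude Q represents an estimate of the degree of covariate disparity in terms of the linear contrast of standardized covariate means.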
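The confidence interval of the contrast effect ψ can be obtained from the standard pivotal quantity T and its distribution:

$$T = \frac{\hat{\psi} - \psi}{\hat{\tau}} \sim t(\nu),$$

where $\hat{\tau}^2 = \hat{\sigma}^2 V$, $\hat{\sigma}^2 = (S_{YY} - S_{YX}S_{XX}^{-1}S_{XY})/\nu$ with $S_{YY}$, $S_{YX}$, $S_{XX}$, and $S_{XY}$ denoting the pooled within-group sums of squares and cross-products of the response and covariates, $\nu = N_T - G - P$, and $t(\nu)$ is the t distribution with ν degrees of freedom. Specifically, the 100(1 − α)% equal-tail two-sided confidence limits $(\hat{\psi}_L, \hat{\psi}_U)$ of the contrast effect ψ are

$$\hat{\psi}_L = \hat{\psi} - t_{\nu,\alpha/2}\hat{\tau} \quad\text{and}\quad \hat{\psi}_U = \hat{\psi} + t_{\nu,\alpha/2}\hat{\tau}, \qquad (8)$$

where $t_{\nu,\alpha/2}$ is the upper 100(α/2)th percentile of the $t(\nu)$ distribution.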
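As a rough illustration of the t-based limits for ψ, the following Python sketch evaluates $\hat{\psi} \pm t_{\nu,\alpha/2}\hat{\tau}$ with the standard library only. The helper names (`t_pdf`, `t_cdf`, `t_upper_quantile`, `contrast_ci`) are hypothetical, the t quantile is found by simple numerical integration and bisection instead of a statistical library, and the inputs are the summary values quoted in the worked example of this paper.

```python
import math

def t_pdf(x, nu):
    """Density of the central t distribution with nu degrees of freedom."""
    log_const = (math.lgamma((nu + 1) / 2) - math.lgamma(nu / 2)
                 - 0.5 * math.log(nu * math.pi))
    return math.exp(log_const - (nu + 1) / 2 * math.log1p(x * x / nu))

def t_cdf(x, nu, steps=2000):
    """CDF by composite Simpson integration of the density over [0, |x|]."""
    b = abs(x)
    h = b / steps
    total = t_pdf(0.0, nu) + t_pdf(b, nu)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * t_pdf(i * h, nu)
    area = total * h / 3
    return 0.5 + area if x >= 0 else 0.5 - area

def t_upper_quantile(p, nu):
    """Value q with P(T > q) = p, by bisection on [0, 50] (needs 0 < p < 0.5)."""
    lo, hi = 0.0, 50.0
    for _ in range(60):
        mid = (lo + hi) / 2
        if 1 - t_cdf(mid, nu) > p:
            lo = mid      # upper-tail probability still too large: move right
        else:
            hi = mid
    return (lo + hi) / 2

def contrast_ci(psi_hat, sigma2_hat, v, nu, alpha):
    """Equal-tail limits psi_hat -/+ t_{nu, alpha/2} * tau_hat, tau_hat^2 = sigma2_hat * V."""
    tau_hat = math.sqrt(sigma2_hat * v)
    half = t_upper_quantile(alpha / 2, nu) * tau_hat
    return psi_hat - half, psi_hat + half

# Summary values from the learning disabilities example:
# contrast of the reported adjusted means, nu = 59 - 3 - 1 = 55, 90% level.
psi_hat = -1.0 * 5.7408 + 0.5 * 8.5512 + 0.5 * 7.8950
lower, upper = contrast_ci(psi_hat, sigma2_hat=3.2728, v=0.0814, nu=55, alpha=0.10)
print(round(lower, 3), round(upper, 3))
```

The bisection route is chosen only to keep the sketch dependency-free; in practice a statistical package would supply the t quantile directly.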
The functional relationship $\psi^* = \psi/\sigma$ between the unstandardized and standardized contrast effects suggests that an approximate confidence interval of $\psi^*$ can be obtained immediately by dividing the confidence limits $\{\hat{\psi}_L, \hat{\psi}_U\}$ of ψ by the estimate $\hat{\sigma}$ of the error standard deviation. This direct division gives an approximate 100(1 − α)% confidence interval $\{\hat{\psi}^*_{DL}, \hat{\psi}^*_{DU}\} = \{\hat{\psi}_L/\hat{\sigma}, \hat{\psi}_U/\hat{\sigma}\}$ of $\psi^*$. The identical procedure was proposed in Bird [35] and Fidler and Thompson [36] for finding the confidence interval of δ from the confidence limits of the mean difference $\mu_E - \mu_C$, and in Kline [4] for obtaining the confidence interval of a standardized treatment contrast from the interval endpoints of the treatment contrast in ANOVA.
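It should be noted that the exact two-sided interval limits $\{\hat{\psi}^*_{EL}, \hat{\psi}^*_{EU}\}$ are not equidistant from the standardized contrast estimate $\hat{\psi}^*$ except in the special case $\hat{\psi}^* = 0$. However, the asymptotic interval $\{\hat{\psi}^*_{AL}, \hat{\psi}^*_{AU}\}$ and the direct-division interval $\{\hat{\psi}^*_{DL}, \hat{\psi}^*_{DU}\}$ are symmetric about $\hat{\psi}^*$, with half-widths $H_A$, which is based on the normal quantile $z_{\alpha/2}$, and $H_D = t_{\nu,\alpha/2}V^{1/2}$, respectively. Note that the difference between $z_{\alpha/2}$ and $t_{\nu,\alpha/2}$ is usually trivial for large ν. Hence, it is generally true that $H_D \le H_A$, and the interval $\{\hat{\psi}^*_{DL}, \hat{\psi}^*_{DU}\}$ is practically within the interval $\{\hat{\psi}^*_{AL}, \hat{\psi}^*_{AU}\}$ even when the estimate $\hat{\psi}^*$ is nearly zero. To contribute to the understanding of interval estimation of the standardized contrast effects, numerical appraisals are presented next to investigate the estimation behavior of the three confidence intervals.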

An example
Fleiss [37, Section 7.2] described a study comparing three methods of treating the learning disabilities of children with respiratory diseases. The response variable is the number of correct answers to a test with 15 questions, and the covariate is the number of correct answers to a similar test with 7 questions. The proportions of correct answers were not close to zero or one. Hence, no transformation was applied to the discrete data to satisfy the normality assumption. The adjusted mean estimates and group sample sizes are $\{\hat{\mu}_1^*, \hat{\mu}_2^*, \hat{\mu}_3^*\} = \{5.7408, 8.5512, 7.8950\}$ and $\{N_1, N_2, N_3\} = \{19, 20, 20\}$, respectively. Also, the covariate coefficient estimate is $\hat{\beta} = 1.2946$ and the error variance estimate is $\hat{\sigma}^2 = 3.2728$. The omnibus F test of no treatment effect is $F^* = 12.52$ with a p-value < 0.0001. Accordingly, the null hypothesis of no treatment effects is rejected and there exist statistically significant differences among the three treatments at α = 0.05. For illustration, standardized contrast analysis is presented here to complement the conventional unstandardized contrast assessments. With $\{c_1, c_2, c_3\} = \{-1, 1/2, 1/2\}$, the unstandardized and standardized contrast estimates are $\hat{\psi} = 2.4823$ and $\hat{\psi}^* = 1.3721$, respectively. It can be shown that the linear contrast of standardized covariate means, or the covariate disparity index, is Q = 0.2131 and the factor V of the variance of the linear contrast is V = 0.0814. The 90% confidence intervals of $\psi^*$ are then computed with the exact approach, the asymptotic method, and direct division.
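The point estimates in this example can be rechecked from the reported summary statistics. The short Python sketch below does so and also forms the direct-division limits $\hat{\psi}^* \pm t_{\nu,\alpha/2}V^{1/2}$. Two items are assumptions rather than values taken from the paper: the decomposition $V = a + Q/(N_T - G)$, which is used only as a consistency check against the reported V, and the tabled critical value 1.673 for the upper 5% point of t(55). Recomputing from the rounded adjusted means gives roughly 2.482 for the unstandardized contrast and 1.372 for the standardized contrast.

```python
import math

# Summary statistics reported in the example (Fleiss data).
adj_means = [5.7408, 8.5512, 7.8950]   # adjusted group means
n = [19, 20, 20]                       # group sample sizes
c = [-1.0, 0.5, 0.5]                   # contrast coefficients
sigma2_hat = 3.2728                    # error variance estimate
q = 0.2131                             # reported covariate disparity Q
v_reported = 0.0814                    # reported variance factor V
G, P = 3, 1

psi_hat = sum(ci * mi for ci, mi in zip(c, adj_means))
psi_star_hat = psi_hat / math.sqrt(sigma2_hat)

a = sum(ci ** 2 / ni for ci, ni in zip(c, n))
n_total = sum(n)
# ASSUMPTION: V = a + Q / (N_T - G); checked only against the reported V.
v_check = a + q / (n_total - G)

# Direct division: psi_star_hat -/+ t * sqrt(V), nu = N_T - G - P = 55.
# ASSUMPTION: 1.673 is a table value for the upper 5% point of t(55).
t_crit = 1.673
half_width = t_crit * math.sqrt(v_reported)
direct = (psi_star_hat - half_width, psi_star_hat + half_width)

print(f"psi_hat = {psi_hat:.4f}, psi_star_hat = {psi_star_hat:.4f}")
print(f"a = {a:.4f}, V (assumed form) = {v_check:.4f}")
print(f"direct-division 90% limits: {direct[0]:.3f}, {direct[1]:.3f}")
```

The exact interval is not reproduced here because it requires quantiles of the noncentral t distribution, which the Python standard library does not provide.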

Empirical comparisons
To provide a more thorough evaluation, simulation studies were performed to examine the coverage performance of the interval procedures. The more involved precision assessments and sample size determinations of standardized contrast effects are presented in the Results section. For ease of explication, the model and parameter configurations of the learning disabilities study are modified and extended in this numerical examination. Specifically, Monte Carlo simulation studies of 10,000 iterations were performed with the normal distribution of the linear contrast estimator, $\hat{\psi} \sim N(\psi, \tau^2)$, and the chi-square distribution of the error variance estimator, $\nu\hat{\sigma}^2/\sigma^2 \sim \chi^2(\nu)$. Note that the variance of a linear contrast has the simple form $\tau^2 = \sigma^2 V$. Accordingly, the error variance estimate and covariate statistic of the learning disabilities data are used to set the variance component $\sigma^2 = 3.2728$ and covariate disparity Q = 0.2131. The contrast effect ψ for the contrast coefficients $\{c_1, c_2, c_3\} = \{-1, 1/2, 1/2\}$ takes four different magnitudes such that $\psi^* = \psi/\sigma = 0, 1, 2,$ and 3. Also, the designated balanced designs with G = 3 have group sample sizes N = 5, 10, and 20. For each replicate, the lower and upper confidence limits $\{\hat{\psi}^*_{EL}, \hat{\psi}^*_{EU}\}$, $\{\hat{\psi}^*_{AL}, \hat{\psi}^*_{AU}\}$, and $\{\hat{\psi}^*_{DL}, \hat{\psi}^*_{DU}\}$ were computed for the 95% and 97.5% one-sided confidence intervals. These interval limits were then combined to construct the 90% and 95% equal-tail two-sided confidence intervals. The simulated coverage probability was the proportion of the 10,000 replicates whose confidence interval contained the population standardized contrast effect. The adequacy of the one- and two-sided interval procedures is determined by the error between the simulated coverage probability and the nominal coverage probability. The results of the 12 combinations of total sample size and standardized contrast effect are summarized in Tables 1 and 2 for the two-sided confidence coefficients 1 − α = 0.90 and 0.95, respectively.
The simulated coverage performance in Tables 1 and 2 suggests that the exact approach is excellent in attaining the nominal confidence levels for all one- and two-sided situations. Although the sample sizes are not large, the asymptotic method provides surprisingly accurate results and is only slightly inferior to the exact procedure. Note that the small discrepancies of the exact and asymptotic procedures are generally within the 95% simulation variability of 0.006, 0.004, and 0.003 for the nominal coverage probabilities of 0.90, 0.95, and 0.975, respectively. However, the simulated coverage rates of the direct division are systematically lower than the nominal level, especially for large $\psi^* > 1$. The exact interval procedure generally yields asymmetric confidence intervals for $\psi^*$. Consequently, the equidistant confidence limits of the direct division technique tend to have undesirable effects. For the model configurations examined here, the lower confidence limits of the one-sided upper 95% and 97.5% confidence intervals are substantially less accurate than their counterparts, the upper confidence limits of the one-sided lower 95% and 97.5% confidence intervals. The only cases in which the direct division gives acceptable coverage performance are those associated with $\psi^* = 0$. Note that the three confidence intervals for the standardized contrast effect are asymptotically equivalent. However, when the magnitude of $\psi^*$ is not zero, the direct method is substantially less accurate than the other two procedures even for the largest sample sizes considered. In view of these findings, the exact confidence interval estimation is recommended. When no statistical software is available, the asymptotic method provides a viable alternative for hand computation. However, the direct division procedure is too simple to be useful and is not appropriate for general applications.
Additional assessments under other configurations were also conducted, and the results lead to the same conclusions as those reported here.
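The coverage shortfall of the direct division can be illustrated with a small Monte Carlo sketch in Python that mimics the simulation design described above: $\hat{\psi}$ is drawn from $N(\psi, \tau^2)$ and $\nu\hat{\sigma}^2/\sigma^2$ from $\chi^2(\nu)$. This is not the authors' program. The function name, the seed, the assumed form $V = a + Q/(N_T - G)$, and the tabled t critical values in `T_TABLE` are all assumptions made for the illustration, and only the two-sided 90% direct-division interval is examined.

```python
import math
import random

# ASSUMPTION: tabled upper 5% points of t(nu) for nu = 11 (N = 5) and nu = 26 (N = 10).
T_TABLE = {11: 1.7959, 26: 1.7056}

def direct_division_coverage(psi_star, n_per_group, reps=10000, seed=1):
    """Simulated 90% coverage of psi_hat/sigma_hat -/+ t * sqrt(V) for psi* = psi/sigma."""
    G, P = 3, 1
    c = [-1.0, 0.5, 0.5]
    sigma2, q = 3.2728, 0.2131            # settings taken from the example
    n_total = G * n_per_group
    nu = n_total - G - P
    a = sum(ci ** 2 for ci in c) / n_per_group
    v = a + q / (n_total - G)             # ASSUMPTION: form of the factor V
    sigma = math.sqrt(sigma2)
    psi = psi_star * sigma
    tau = sigma * math.sqrt(v)
    half = T_TABLE[nu] * math.sqrt(v)     # half-width H_D of the direct division
    rng = random.Random(seed)
    hits = 0
    for _ in range(reps):
        psi_hat = rng.gauss(psi, tau)
        # nu * sigma_hat^2 / sigma^2 ~ chi-square(nu) = Gamma(shape nu/2, scale 2)
        sigma_hat = math.sqrt(sigma2 * rng.gammavariate(nu / 2, 2.0) / nu)
        est = psi_hat / sigma_hat
        if est - half <= psi_star <= est + half:
            hits += 1
    return hits / reps

cov_zero = direct_division_coverage(0.0, n_per_group=5)
cov_three = direct_division_coverage(3.0, n_per_group=5)
print(cov_zero, cov_three)
```

At $\psi^* = 0$ the direct-division interval contains zero exactly when the usual t interval for ψ does, so its coverage should sit near the nominal level; for a large $\psi^*$ the fixed half-width ignores the extra variability of $\hat{\psi}^*$ and the simulated coverage is expected to drop.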

Precision assessments and sample size determinations
Within the context of ANCOVA, Hochberg and Varon-Salomon [38] showed that the conditional (on the covariates) confidence interval procedure compares favorably with the unconditional method based on the joint distribution of the response and covariate variables. Thus, the inferential procedures of hypothesis testing and interval estimation are the same under both fixed and random formulations. The distinction between the two modeling setups, however, becomes crucial when power, coverage probability, and the corresponding sample size calculations are to be made. To exhibit the unique and distinct precision characteristics of confidence intervals, two useful criteria were considered in Kupper and Hafner [39] for one- and two-sample problems. The suggested precision assessments control the expected magnitude of interval widths and the assurance probability that interval widths fall within a pre-assigned threshold.

Precision assessments
When planning and conducting ANCOVA research, the actual values of the continuous measurements of response and covariate variables for each subject are available only after the observations are obtained. In addition to the randomness of normal responses, the stochastic nature of covariate variables has to be taken into account in precision analysis under the general and unconditional context of ANCOVA. It is convenient and useful to consider the multinormal setting in which the covariate vectors $X_{ij} = (X_{1ij}, \ldots, X_{Pij})^T$ are independently distributed as

$$X_{ij} \sim N_P(\mu_{Xi}, \Sigma_X),$$

where $\mu_{Xi}$ is a P × 1 vector and $\Sigma_X$ is a P × P positive-definite variance-covariance matrix for i = 1, ..., G, and j = 1, ..., $N_i$. The quantity V in $\mathrm{Var}(\hat{\psi}) = \sigma^2 V$ can then be expressed as $V = a\{1 + [P/(\nu + 1)]F_X\}$ with $a = \sum_{i=1}^{G} c_i^2/N_i$ and

$$F_X \sim F(P, \nu + 1, \xi), \qquad (18)$$

where $F(P, \nu + 1, \xi)$ is the noncentral F distribution with degrees of freedom P and ν + 1 and noncentrality parameter $\xi = \mu_{XC}^T\Sigma_X^{-1}\mu_{XC}/a$ with $\mu_{XC} = \sum_{i=1}^{G} c_i\mu_{Xi}$. It is noted in Anderson [40] that for large samples the distribution of $F_X$ given by Eq 18 is approximately valid even if the parent distribution of the covariates is not normal. In addition, the robust features of the F and Hotelling's statistics have been examined empirically and theoretically in Chase and Bulgren [42], Everitt [43], Holloway and Dunn [44], Hopkins and Clay [45], and Kariya [46,47]. Hence, the noncentral F distribution provides a robust approximation for general use.
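The role of random covariates in the variance factor V can be checked numerically. The Python sketch below is an illustration under stated assumptions, not the authors' code: it takes a single normal covariate (P = 1) with unit variance, assumes the conditional form $V = a + \bar{X}_C^2/S_{XX}$ and the representation $V = a\{1 + [P/(\nu + 1)]F_X\}$ with $F_X \sim F(P, \nu + 1, \xi)$ and $\xi = \theta/a$, and compares the simulated mean of V with the value implied by the standard mean of a noncentral F variable. The function name, the seed, and the design values are hypothetical.

```python
import random

def mean_variance_factor(n_per_group, theta, reps=5000, seed=7):
    """Monte Carlo mean of V = a + Xc^2 / Sxx for one normal covariate (P = 1)."""
    c = [1.0, -0.5, -0.5]
    a = sum(ci ** 2 for ci in c) / n_per_group
    # Unit covariate variance; group 1 has mean sqrt(theta) and the others 0,
    # so the squared contrast of covariate means equals theta.
    mu = [theta ** 0.5, 0.0, 0.0]
    rng = random.Random(seed)
    total = 0.0
    for _ in range(reps):
        xc, sxx = 0.0, 0.0
        for ci, mi in zip(c, mu):
            x = [rng.gauss(mi, 1.0) for _ in range(n_per_group)]
            xbar = sum(x) / n_per_group
            xc += ci * xbar                            # contrast of group covariate means
            sxx += sum((xi - xbar) ** 2 for xi in x)   # within-group sum of squares
        total += a + xc * xc / sxx
    return total / reps, a

n, theta, G, P = 10, 0.25, 3, 1
nu = G * n - G - P
sim_mean, a = mean_variance_factor(n, theta)
xi = theta / a
# Mean of F(P, nu + 1, xi) is (nu + 1)(P + xi) / (P (nu - 1)) for nu > 1,
# so E[V] = a * (1 + (P + xi) / (nu - 1)) under the assumed representation.
theory_mean = a * (1 + (P + xi) / (nu - 1))
print(sim_mean, theory_mean)
```

Close agreement between the two printed values would support the assumed representation for this configuration; it does not by itself validate the full distributional claim.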
A useful population covariate disparity index can be defined as

$$\theta = \mu_{XC}^T\Sigma_X^{-1}\mu_{XC}, \qquad \mu_{XC} = \sum_{i=1}^{G} c_i\mu_{Xi},$$

so that the noncentrality parameter is $\xi = \theta/a$. However, it is a common assumption for a randomized design that the covariate characteristics are identical for all treatment groups with $\mu_{X1} = \ldots = \mu_{XG} = \mu_X$, and the noncentrality parameter and covariate disparity then reduce to ξ = θ = 0. In general, the scaled quantity $f^{2*} = \xi/N_T = \theta/(aN_T) = \theta/a^*$ has a role similar to that of the signal-to-noise ratio $f^2$ in ANOVA, where $a^* = \sum_{i=1}^{G} c_i^2/p_i$ and $p_i = N_i/N_T$ for i = 1, ..., G. Thus, the conventional benchmarks of Cohen [2] for small, medium, and large effects of f = 0.10, 0.25, and 0.40 may serve as a general guideline for the magnitude of covariate disparity when no better basis is available. However, the merit of a specific size of disparity should be assessed and interpreted by direct comparison with the findings in related prior studies.
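To make the scaling concrete, the following Python sketch evaluates $a$, $a^*$, $\xi = \theta/a$, and $f^{2*} = \xi/N_T = \theta/a^*$ for a balanced three-group design with contrast coefficients {1, -0.5, -0.5}. The function name and the group size of 20 are hypothetical choices for illustration; the values θ = 0, 0.25, and 0.50 are the disparity levels used in the numerical study of this paper.

```python
def scaled_disparity(theta, c, n):
    """Return (a, a_star, xi, f2_star) with f2_star = xi / N_T = theta / a_star."""
    n_total = sum(n)
    a = sum(ci ** 2 / ni for ci, ni in zip(c, n))
    a_star = sum(ci ** 2 / (ni / n_total) for ci, ni in zip(c, n))
    xi = theta / a                      # noncentrality implied by xi / N_T = theta / (a N_T)
    return a, a_star, xi, xi / n_total

c = [1.0, -0.5, -0.5]
for theta in (0.0, 0.25, 0.50):
    a, a_star, xi, f2 = scaled_disparity(theta, c, n=[20, 20, 20])
    # f2 ** 0.5 is on the same scale as Cohen's f benchmarks (0.10, 0.25, 0.40)
    print(theta, round(a_star, 3), round(f2, 4), round(f2 ** 0.5, 3))
```

With these coefficients and equal allocation, $a^* = 4.5$ regardless of the group size, so the scaled disparity depends on the design only through θ and the allocation proportions.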
In order to accommodate the stochastic properties of the covariate and response variables through the joint distribution of $F_X$ and $T^*$, the expected width of the interval is evaluated as

$$\Omega = E[W] = E_F[E_T[W]], \qquad W = \hat{\psi}^*_{EU} - \hat{\psi}^*_{EL},$$

where $E_F[\cdot]$ and $E_T[\cdot]$ are taken with respect to the noncentral F distribution of $F_X$ and the noncentral t distribution of $T^*$, respectively. An alternative approach to precision appraisal of confidence intervals is the assurance probability that the interval width W is not wider than the designated bound ω (> 0):

$$\Gamma = P\{W \le \omega\} = E_F[E_T[\mathcal{G}]],$$

where $\mathcal{G} = 1$ if W ≤ ω and $\mathcal{G} = 0$ if W > ω.

Sample size determinations
In planning a research design, sample size determinations need to be conducted to achieve the designated precision requirement in interval estimation. For the current problem of precise interval estimation of standardized contrasts, it is desirable to determine the minimal sample sizes such that the expected width of a 100(1 − α)% confidence interval is within the selected bound ω (> 0):

$$\Omega = E[W] \le \omega.$$

Also, one may compute the minimal sample sizes needed to guarantee, with a given assurance probability 1 − γ (< 1), that the width of a 100(1 − α)% confidence interval will not exceed the chosen threshold ω:

$$\Gamma = P\{W \le \omega\} \ge 1 - \gamma.$$

Note that Lai and Kelley [18] presented approximate sample size formulas for interval estimation of standardized contrasts for balanced ANCOVA with one covariate ($N_i = N$ and P = 1). They assumed that, under a randomized design, the linear contrast of covariate means $\sum_{i=1}^{G} c_i\bar{X}_i$ is usually close to zero and the sample variance of a linear contrast can be approximated as $\tau^2 \doteq a\sigma^2$. Extending such a simplification to the general case of ANCOVA designs with one or more covariates, the distribution of $T^* = \hat{\psi}/\hat{\tau}$ given in Eq 10 is simplified as

$$T^* \overset{\cdot}{\sim} t(\nu, \lambda_a),$$

where $\lambda_a = \psi^*/a^{1/2}$. The expected width and assurance probability of interval widths are modified as

$$\Omega_a = E_{Ta}[W] \quad\text{and}\quad \Gamma_a = P_{Ta}\{W \le \omega\},$$

respectively, where $E_{Ta}[\cdot]$ and $P_{Ta}\{\cdot\}$ are taken with respect to the approximate noncentral t distribution $t(\nu, \lambda_a)$ of $T^*$. These two simplified precision functions $\Omega_a$ and $\Gamma_a$ provide alternative sample size calculations for standardized contrast analysis. For comparative purposes, these approximate methods are also investigated to reveal the impact of omitted covariate effects on sample size determinations.
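One rough way to anticipate the direction of the effect of dropping the covariate term is to compare a with the mean of V under the multinormal covariate model. The display below is a sketch, not a result quoted from the paper: it assumes the representation $V = a\{1 + [P/(\nu + 1)]F_X\}$ with $F_X \sim F(P, \nu + 1, \xi)$ and $\xi = \theta/a$, and it uses the standard mean of a noncentral F variable.

```latex
\begin{aligned}
E[F_X] &= \frac{(\nu + 1)(P + \xi)}{P(\nu - 1)}, \qquad \nu > 1, \\
E[V]   &= a\left\{1 + \frac{P}{\nu + 1}\,E[F_X]\right\}
        = a\left\{1 + \frac{P + \xi}{\nu - 1}\right\}
        = a + \frac{aP + \theta}{\nu - 1} \;>\; a .
\end{aligned}
```

Under these assumptions V exceeds a on average by the factor $1 + P/(\nu - 1)$ even when θ = 0, and by more as P or θ grows. Replacing V with a therefore tends to understate the variability of the contrast estimator, which would be in line with a simplified method that asks for fewer observations than are actually needed.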

Numerical illustrations
The underlying behavior and relative performance of the exact and approximate procedures for precision and sample size calculations are demonstrated in the following empirical study. First, the minimum sample sizes for the expected width and assurance probability criteria are computed by the exact and approximate distribution functions under the designated model configurations. Second, the accuracy of the precision and sample size procedures is examined through a Monte Carlo simulation study to clarify the influence of covariate features under a wide range of model scenarios. Accordingly, balanced ANCOVA models with three treatment groups (G = 3) and one to five (P = 1, ..., 5) normal covariates are used as the basis for the numerical assessments. The model configurations have the designated settings: the intercept parameters $\{\mu_1, \mu_2, \mu_3\} = \{0.5, 0, 0\}$, contrast coefficients $\{c_1, c_2, c_3\} = \{1, -0.5, -0.5\}$, and error variance $\sigma^2 = 1$. Note that the resulting standardized contrast effect is $\psi^* = 0.5$. The covariate distribution function $F_X$ depends on the group mean values and variance-covariance matrix of the multinormal covariate distributions through the covariate disparity index θ. Hence, without loss of generality, the corresponding mean values are set as $\mu_{X1} = (\theta^{1/2}, 0, \ldots, 0)^T$ and $\mu_{X2} = \mu_{X3} = 0$, where 0 is a P × 1 null column vector, and $\Sigma_X = I_P$ is the identity matrix of dimension P. Three different magnitudes of covariate disparity are considered, θ = 0, 0.25, and 0.50, to represent potential covariate characteristics in randomized and nonrandomized designs. Throughout this numerical demonstration, only balanced designs with sample size ratios $\{r_1, r_2, r_3\} = \{1, 1, 1\}$ are considered and the confidence level is fixed at 1 − α = 0.95. The computed total sample sizes for homogeneous covariates θ = 0 with the expected width thresholds ω = 1.00, 1.25, and 1.50, and P = 1, ..., 5 are presented in Table 3, while the corresponding total sample sizes associated with θ = 0.25 and 0.50 are summarized in Tables 4 and 5, respectively. The optimal total sample sizes for assurance probability 1 − γ = 0.80 with ω = 1.00, 1.25, and 1.50 are summarized in Tables 6-8 for θ = 0, 0.25, and 0.50, respectively.

It can be seen from the results of the proposed approach in Tables 3-8 that larger sample sizes are required for a smaller expected width threshold ω, and the results for ω = 1 are almost twice those for ω = 1.5 in the same table. Also, the assurance probability principle demands larger sample sizes than the expected width criterion for the settings examined here. It is expected that the necessary sample sizes will increase for higher assurance probability levels 1 − γ > 0.80. The computed sample sizes of the approximate method show similar patterns with respect to the width threshold and precision criteria. However, because of the omission of covariate features, the computed sample sizes of the approximate method do not vary with the number of covariates P and the covariate disparity θ. The estimated expected width and estimated assurance probability of the exact and approximate procedures are also listed in Tables 3-8.

In the second stage, simulated values of expected width and assurance probability associated with the reported sample sizes and selected parameter configurations are computed through a Monte Carlo study of 10,000 independent data sets. For each replicate, $N_T$ sets of covariate values are generated from the prescribed multinormal distribution. These values of covariates, treatment effects, and error variance $\sigma^2$, in turn, determine the mean responses for generating $N_T$ normal outcomes of the ANCOVA model. Because the covariate coefficients are nuisance parameters, they are set as $\beta_1 = \ldots = \beta_P = 0.5$ in the precision analysis. According to the differences between the estimated and simulated precision quantities of Ω and Γ in Tables 3-8, the exact approach results in outstanding performance for all 90 cases. In contrast, the approximate method does not provide accurate sample size calculations for most of the cases. Note that the simplified precision assessments $\Omega_a$ and $\Gamma_a$ are presumed valid under the assumption of randomized designs, such as the situations with covariate disparity θ = 0 in Tables 3 and 6. However, the estimated expected width $\Omega_a$ is consistently less than the simulated expected width for all the cases in Table 3. The results in Table 6 reveal that the assurance probability calculation $\Gamma_a$ is substantially larger than the simulated assurance probability. Therefore, the approximate method generally underestimates the sample sizes, and the deficiency is more prominent and noticeably problematic for the cases with covariate disparity θ = 0.25 and 0.50 in Tables 4, 5, 7, and 8. Consequently, the usefulness of the approximate procedures is extremely limited and the exact techniques are recommended for precision assessments and sample size determinations.

Table 3. Computed sample size, estimated expected width, and simulated expected width for 95% confidence interval of standardized contrast when the number of groups G = 3, standardized contrast effect $\psi^*$ = 0.5, and covariate disparity θ = 0. Panels: the proposed approach; the approximate method.

Table 4. Computed sample size, estimated expected width, and simulated expected width for 95% confidence interval of standardized contrast when the number of groups G = 3, standardized contrast effect $\psi^*$ = 0.5, and covariate disparity θ = 0.25. Panels: the proposed approach; the approximate method.

Table 5. Computed sample size, estimated expected width, and simulated expected width for 95% confidence interval of standardized contrast when the number of groups G = 3, standardized contrast effect $\psi^*$ = 0.5, and covariate disparity θ = 0.50. Panels: the proposed approach; the approximate method.

Table 6. Computed sample size, estimated assurance probability, and simulated assurance probability for 95% confidence interval of standardized contrast when the number of groups G = 3, standardized contrast effect $\psi^*$ = 0.5, assurance probability 1 − γ = 0.80, and covariate disparity θ = 0. Panels: the proposed approach; the approximate method.

Table 7. Computed sample size, estimated assurance probability, and simulated assurance probability for 95% confidence interval of standardized contrast when the number of groups G = 3, standardized contrast effect $\psi^*$ = 0.5, assurance probability 1 − γ = 0.80, and covariate disparity θ = 0.25. Panels: the proposed approach; the approximate method.

Table 8. Computed sample size, estimated assurance probability, and simulated assurance probability for 95% confidence interval of standardized contrast when the number of groups G = 3, standardized contrast effect $\psi^*$ = 0.5, assurance probability 1 − γ = 0.80, and covariate disparity θ = 0.50. Panels: the proposed approach; the approximate method.
Additional numerical assessments were also conducted for non-normal errors t(10) and non-normal covariates: Exponential(1). The corresponding specifications of the R programs for expected width and assurance probability are presented in supplemental files E and F, respectively: ancova.scie.apx2.fun(alpha = 0.05, g = 3, p = 1, sigsq = 1, rvec = c(1,1,1), cvec = c(1,-0.5,-0.5), psis = 0.5, theta = 0, omega = 1.5) and ancova.scie.apx3.fun(alpha = 0.05, g = 3, p = 1, sigsq = 1, rvec = c(1,1,1), cvec = c(1,-0.5,-0.5), psis = 0.5, theta = 0, omega = 1.5, ap = 0.8).
It should be emphasized that the analytic complexity requires computer algorithms to compute the exact confidence intervals and optimal sample sizes under various design configurations. The developed computer programs substantially facilitate the recommended confidence interval and sample size techniques in practical applications. Users can easily alter the exemplifying settings in the programs with their designated model specifications.

Conclusions
Measures of effect size and confidence intervals are extremely useful for comparing quantitative information across different studies. This research describes and compares exact and approximate confidence intervals for standardized contrast effects in ANCOVA. The theoretical and numerical findings show that the exact approach has analytic support and an empirical advantage over the two approximate methods based on asymptotic theory and direct division. The advanced aspects of precision appraisals and sample size calculations are also investigated for interval estimation of standardized contrasts. The existing study provided a shortcut under the speculation that covariate disparity is unlikely to exist in randomized designs. However, the approximate technique does not give reliable precision and sample size outcomes even when there is no covariate disparity. The proposed exact approach has the distinct utility of accommodating the full distributional information of normal covariates, especially the potential disparity of unequal means, for both randomized and nonrandomized ANCOVA designs. Overall, the described results on interval estimation, precision assessment, and sample size planning for standardized contrasts update and expand upon current work on ANCOVA in the literature.